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ABSTRACT 

Photometric redshift estimation is becoming an increasingly important 
technique, although the currently existing methods present several shortcomings 
which hinder their application. Here it is shown that most of those drawbacks 
are efficiently eliminated when Bayesian probability is consistently applied 
to this problem. The use of prior probabilities and Bayesian marginalization 
allows the inclusion of valuable information, e.g. the redshift distributions or 
the galaxy type mix, which is often ignored by other methods. It is possible to 
quantify the accuracy of the redshift estimation in a way with no equivalents 
in other statistical approaches; this property permits the selection of galaxy 
samples for which the redshift estimation is extremely reliable. In those cases 
when the a priori information is insufficient, it is shown how to ‘calibrate’ the 
prior distributions, using even the data under consideration. 

There is an excellent agreement between the ~ 100 HDF spectroscopic 
redshifts and the predictions of the method, with a rms error ^z/{l+Zgpec) = 0.08 
up to 2 ; < 6 and no systematic biases nor outliers. Note that these results 
have not been reached by minimizing the difference between spectroscopic and 
photometric redshifts (as is the case with empirical training set techniques), 
which may lead to an overestimation of the accuracy. The reliability of the 
method is further tested by restricting the color information to the UBVI filters. 
The results thus obtained are shown to be more reliable than those of standard 
techniques even when the latter include near-IR colors. 

The Bayesian formalism developed here can be generalized to deal with 
a wide range of problems which make use of photometric redshifts. Several 
applications are outlined, e.g. the estimation of individual galaxy characteristics 
as the metallicity, dust content, etc., or the study of galaxy evolution and the 
cosmological parameters from large multicolor surveys. Finally, using Bayesian 
probability it is possible to develop an integrated statistical method for cluster 
mass reconstruction which simultaneously considers the information provided 
by gravitational lensing and photometric redshift estimation. 

Subject headings: photometric redshifts; galaxy evolution; statistical methods; 
gravitational lensing 
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1. Introduction 

The advent of the new class of 10-ni ground based telescopes is having a strong impact 
on the study of galaxy evolution. For instance, instruments as LRIS at the Keck allow 
observers to regularly secure redshifts for dozens of / ~ 24 galaxies in several hours of 
exposure. Technical advances in the instrumentation, combined with the proliferation of 
similar telescopes in the next years guarantees a vast increase in the number of galaxies, 
bright and faint, for which spectroscopical redshifts will be obtained in the near future. 
Notwithstanding this progress in the sheer numbers of available spectra, the / 24 ‘barrier’ 

(for reasonably complete samples) is likely to stand for a time, as there are not foreseeable 
dramatic improvements in the telescope area or detection techniques. 

Despite the recent spectacular hndings of very high redshift galaxies, (|Dey et al. 
19981 , IFranx et al. 19971 [Frye, Broadhurst fc Benitez 1998|) , it is extremely difficult to 
secure redshifts for such objects. On the other hand, even moderately deep ground based 
imaging routinely contain many high redshift galaxies (although hidden amongst myriads 
of foreground ones), not to mention the Hubble Deep Field or the images that will be 
available with the upcoming Advanced Camera. To push further in redshift the study of 
galaxy evolution is therefore very important to develop techniques able to extract galaxy 
redshifts from multicolor photometry data. 

This paper applies the methods of Bayesian probability theory to photometric redshift 
estimation. Despite the efforts of Thomas Loredo, who has written stimulating reviews 
on the subject (Loredo 1990, 1992), Bayesian methods are still far from being one of the 
staple statistical techniques in Astrophysics. Most courses and monographs on Statistics 
only include a small section on Bayes’ theorem, and perhaps as a consequence of that, 
Bayesian techniques are frequently used ad hoc, as another tool from the available panoply 
of statistical methods. However, as any reader of the fundamental treatise by E.T. Jaynes 
(1998) can learn, Bayesian probability theory represents an unihed look to probability and 
statistics, which does not intend to complement, but to fully substitute the traditional, 
‘frequentist’ statistical techniques (see also Bretthorst 1988, 1990) 

One of the fundamental differences between ‘orthodox’ statistics and Bayesian theory, 
is that the probability is not dehned as a frequency of occurrence, but as a reasonable degree 
of belief. Bayesian probability theory is developed as a rigorous full-flegded alternative 
to traditional probability and statistics based on this dehnition and three desiderata: 
a)Degrees of belief should be represented by real numbers, b)One should reason consistently, 
and c)The theory should reduce to Aristotelian logic when the truth values of hypothesis 
are known. 
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One of the most attractive features of Bayesian inference lies on its simplicity. There 
are two basic rules to manipulate probability, the product rule 

P{A,B\C) = P{A\C)P{B\AC) (1) 

and the sum rule 

P{A + B\C) = P{A\C) + P{B\C) - P{A, B\C) (2) 

where “A, B’’’ means “A and B are true”, and “A + S” means “either A or i? or both are 
true”. From the product rule, and taking into account that the propositions “A, S” and 
“B, A” are identical, it is straightforward to derive Bayes’ theorem 


P{A\B,C) 


P{A\C)P{B\A,C) 

P{B\C) 


(3) 


If the set of proposals B = {B^} are mutually exclusive and exhaustive, using the sum 
rule one can write 

P(A, B\C) = P{A, {Bj}|C) = P(A, B,\C) (4) 

i 

which is known as Bayesian marginalization. These are the basic tools of Bayesian inference. 
Properly used and combined with the rules to assign prior probabilities, they are in principle 
enough to solve most statistical problems. 


There are several differences between the methodology presented in this paper and that 
of |Kodama, Bell fc Bower 1998|, the most significant being the treatment of priors (see Sec. 

The procedures developed here offer a major improvement in the redshift estimation 
and based on them it is possible to generate new statistical methods for applications which 
make use of photometric redshifts (Sec. H). 


The outlay of the paper is the following: Sec. 2 reviews the current methods of 
photometric redshifts estimation making emphasis on their main sources of error. Sec. 

3 introduces an expression for the redshift likelihood slightly different from the one used 
by other groups when applying the SED-fitting technique. In Sec. 4 it is described in 
detail how to apply Bayesian probability to photometric redshift estimation; the resulting 
method is called BPZ. Sec 5 compares the performance of traditional statistical techniques, 
as maximum likelihood, with BPZ by applying both methods to the HDF spectroscopic 
sample and to a simulated catalog. Sec. 6 briefly describes how BPZ may be developed 
to deal with problems in galaxy evolution and cosmology which make use of photometric 
redshifts. Sec 7 briefly summarizes the main conclusions of the paper. 
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2. Photometric redshifts: training set vs. SED fitting methods 


There are two basic approaches to photometric redshift estimation. Using the 
terminology of |Yee 1998| , they may be termed ‘SED htting’ and ‘empirical training set’ 
methods. The hrst technique (|Koo 1985| , [Lanzetta, Yahil fc Fernandez-Soto 1996|, IGwyn fc| 
Hartwick 1996| , [Pello et al. 1996| , [Sawicki, Lin, fc Yee 1997| , etc.) involves compiling a library 
of template spectra, empirical or generated with population synthesis techniques . These 
templates, after being redshifted and corrected for intergalactic extinction, are compared 
with the galaxy colors to determine the redshift z which best hts the observations. The 
training set technique ([Brunner et al. 1997|, [Connolly et al. 1995|, [Wang, Bahcall fc 'iurnei 


1998) starts with a multicolor galaxy sample with apparent magnitudes mg and colors C 


which has been spectroscopically identihed. Using this sample, a relationship of the kind 
z = z{C,m) is determined using a multiparametric ht. 


It should be said that these two methods are more similar than what it is usually 
considered. To understand this, let’s analyze how the empirical training set method works. 
For simplicity, let’s forget about the magnitude dependence and let’s suppose that only 
two colors C = (Ci,C 2 ) are enough to estimate the photometric redshifts, that is, given 
a set of spectroscopic redshifts {zgpec} and colors {U}, the training set method tries to 
£t a surface 2 ; = z{C) to the data. It must be realized that this method makes a very 
strong assumption, namely that the surface z = z{C) is a function dehned on the color 
space: each value of C is assigned one and only one redshift. Visually this means that 
the surface z = z{C) does not ‘bend’ over itself in the redshift direction. Although this 
functionality of the redshift/color relationship cannot be taken for granted in the general 
case (at faint magnitudes there are numerous examples of galaxies with very similar colors 
but totally different redshifts), it seems to be a good approximation to the real picture at 
2 ; < 1 redshifts and bright magnitudes ( [Brunner et al. 1997[ ). A certain scatter around this 
surface is allowed: galaxies with the same value of (U) may have slightly different redshifts 
and it seems to be assumed implicitly that this scatter is what limits the accuracy of the 
method. 


The SED htting method is based on the color/redshift relationships generated by each 
of the library templates T, Ct = Ct{z). A galaxy at the position C is assigned the redshift 
corresponding to the closest point of any of the Ct curves in the color space. If these Ct 
functions are inverted, one ends up with the curves zt = zt{Ct)-i which, in general, are not 
functions; they may present self-crossings (and of course they may also cross each other). If 
we limit ourselves to the region in the color/redshift space in which the training set method 
dehnes the surface 2 ; = z{C), for a realistic template set the curves zt = zt{Ct) would be 
embedded in the surface z = z{C), conforming its ‘skeleton’ and dehning its main features. 
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The fact that the surface z = z{C) is continuous, whereas the template-dehned curves 
are sparsely distributed, does not have a great practical difference. The gaps may be hlled 
by hnely interpolating between the templates ([Sawicki, Lin, fc Yee 19971) , but this is not 
strictly necessary: usually the statistical procedure employed to search for the best redshift 
performs its own interpolation between templates. When the colors of a galaxy do not 
exactly coincide with one of the templates, or the maximum likelihood method will assign 
the redshift corresponding to the nearest template in the color space. This is equivalent 
to the curves zt = zt{Ct) having extended ‘influence areas’ around them, which conform 
a sort of step-like surface which interpolates across the gaps, and also extends beyond the 
region limited by them in the color space. Therefore, the SED-htting method comes with a 
built-in interpolation (and extrapolation) procedure. For this reason, the accuracy of the 
photometric redshifts does not change dramatically when using a sparse template set as the 
one of [Coleman, Wu, fc Weedman 1980| ( jhanzetta, Yahil fc Eernandez-Soto 1996|) or a hne 
grid of template spectra ( jSawicki, Lin, &: Yee 1997] ). The most crucial factor is that the 
template library, even if it contains few spectra, adequately reflects the main features of 
real galaxy spectra and therefore the main ‘geographical accidents’ of the surface = z{C) 


The intrinsic similarity between both photometric redshift methods explains their 
comparable performance, especially at z ^ 1 redshift ( [Hogg et ah 1998|) . When the topology 
of the color-redshift relationship is simple, as apparently happens at low redshift, the 
training set method will probably work slightly better than the template htting procedure, 
if only because it avoids the possible systematics due to mismatches between the predicted 
template colors and the real ones, and also partially because it includes not only the 
colors of the galaxies, but also their magnitudes, what helps to break the color/redshift 
degeneracies (see below). However, it must be kept in mind that although the hts to the 
spectroscopic redshifts give only a dispersion 6z ~ 0.06 ([Connolly et al. 1997]) , there is not 
a strong guarantee that the predictive capabilities of the training set method will keep such 
an accuracy, even within the same magnitude and redshift ranges. As a matter of fact, 
they do not seem to work spectacularly better than the SED htting techniques ([Hogg et al. 
[1998[ ), even at low and intermediate redshifts. 

However, the main drawback of the training set method is that, due to its empirical and 
ad hoc basis, in principle it can only be reliably extended as far as the spectroscopic redshift 
limit. Because of this, it may represent a cheaper method of obtaining redshifts than the 
spectrograph, but which cannot really go much fainter than it. Besides it is difficult to 
transfer the information obtained with a given set of hlters, to another survey which uses 
a different set. Such an extrapolation has to be done with the help of templates, what 
makes the method lose its empirical purity. And last but not least, it is obvious that as one 
goes to higher redshifts/fainter magnitudes the topology of the color-redshift distribution 
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z = z{C, mo) displays several nasty degeneracies, even if the near-IR information is included, 
and it is impossible to fit a single functional form to the color-redshift relationship. 


Although the SED htting method is not affected by some of these limitations, it also 
comes with its own set of problems. Several authors have analyzed in detail the main 
sources of errors affecting this method (|Sawicki, Lin, fc Yee 1997i|Fernandez-Soto, Lanzetta 
fc Yahil 1998|) . These errors may be divided into two broad classes: 


2 . 1 . 


Color/redshift 


degeneracies 



I-K 



I-K 


Fig. 1.— a) On the left, VI vs. IK for the templates used in Sec § in the interval 1 < z < 5. 
The size of the hlled squares grows with redshift, from z = 1 to z = 5. If these were the 
only colors used for the redshift estimation every crossing of the lines would correspond to 
a color/redshift degeneracy, b) To the right, the same color-color relationships ‘thickened’ 
by a 0.2 photometric error. The probability of color/redshift degeneracies highly increases. 

Fig, |a, shows VI vs IK for the morphological types employed in Sec ^ and 0 < z < 5. 
The color/redshift degeneracies happen when the line corresponding to a single template 
intersects itself or when two lines cross each other at points corresponding to different 
redshifts for each of them (these cases correspond to “bendings” in the redshift/color 
relationship 2 ; = z{C)). It is obvious that the likelihood of such crossings increases with the 
extension of the considered redshift range and the number of templates included. 
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It may seem that even considering a very extended redshift range, such confusions 
could in principle be easily avoided by using enough hlters. However, the presence of 
color/redshift degeneracies is highly increased by random photometric errors, which can 
be visualized as a blurring or thickening of the Ct{zt) relationship (£g. 0b): each point 
of the curves in £g. 0a is expanded into a square of size 5C, the error in the measured 
color. The hrst consequence of this is a ‘continuous’ {5z ~ ^5C) increase in the rms of 
the ‘small-scale’ errors in the redshift estimation, and, what it is worse, the overlaps in 
the color-color space become more frequent, with the corresponding rise in the number of 
‘catastrophic’ redshift errors. In addition, multicolor information may often be degenerate, 
so increasing the number of hlters does not break the degeneracies; for instance, by applying 
a simple PCA analysis to the photometric data of the HDF spectroscopic sample it can be 
shown that the information contained in the seven UBVIJHK hlters for the HDF galaxies 
can be condensed using only three parameters, the coefficients of the principal components 
of the hux vectors (see also |(Jonnolly et al. 1995|) . Therefore, if the photometric errors are 
large, it is not always possible to get totally rid of the degeneracies even increasing the 
number of hlters. This means that the presence of color/redshift degeneracies is unavoidable 
for faint galaxy samples. The training set method somehow alleviates this problem by 
introducing an additional parameter in the estimation, the magnitude, which in some cases 
may break the degeneracy. However, it is obvious that color/redshift degeneracies also ahect 
galaxies with the same magnitude, and the training set method does not even contemplate 
the possibility of their existence! 

The SED-htting method at least allows for the existence of this problem, although 
it is not extremely efficient in dealing with it, especially with noisy data. Its choice of 
redshift is exclusively based on the goodness of ht between the observed colors and the 
templates. In cases as the one described above, where two or more redshift/morphological 
type combinations have practically the same colors, the value of the likelihood C would 
have two or more approximately equally high maxima at diherent redshifts (see hg. 
Depending on the random photometric error, one maximum would prevail over the others, 
and a small change in the hux could involve a catastrophic change in the estimated redshift 
(see hg. 0). However, in many cases there is additional information, discarded by ML, 
which could potentially help to solve such conundrums. For instance, it may be known from 
previous experience that one of the possible redshift/type combinations is much more likely 
than any other given the galaxy magnitude, angular size, shape, etc. In that case, and 
since the likelihoods are not informative enough, it seems clear that the more reasonable 
decision would be to choose the option which is more likely a priori as the best estimate. 
Plain common sense dictates that one should compare all the possible hypotheses with the 
data, as ML does, but simultaneously keeping in mind the degrees of plausibility assigned 




to them by previous experience. There is not a simple way of doing this within ML, at best 
one may remove or change the redshift of the problematic objects by hand or devise ad 
hoc solutions for each case. In contrast, Bayesian probability theory allows to include this 
additional information in a rigorous and consistent way, effectively dealing with this kind of 
errors (see Sec |) 


2.2. Template incompleteness 

In some cases, the spectra of observed galaxies have no close equivalents in the template 
library. Such galaxies will be assigned the redshift corresponding to the nearest template 
in the color/redshift space, no matter how distant from the observed color it is in absolute 
terms. The solution is obvious, one has to include enough templates in the library so that 
all the possible galaxy types are considered. As was explained above, the SED htting 
techniques perform their own ‘automatic’ interpolation and extrapolation, so once the main 
spectral types are included in the template library, the results are not greatly affected if one 
hnely interpolates among the main spectra. The effects of using a correct but incomplete 
set of spectra are shown in Sec ^ 

Both sources of errors described above are exacerbated at high redshifts. High redshift 
galaxies are usually faint, therefore with large photometric errors, and as the color/redshift 
space has a very extended range in 2 ;, the degeneracies are more likely; in addition the 
template incompleteness is worsened as there are few or no empirical spectra with which 
compare the template library. 

The accuracy of any photometric redshift technique is usually established by contrasting 
its output with a sample of galaxies with spectroscopic redshifts. It should be kept in 
mind, though, that the results of this comparison may be misleading, as the available 
spectroscopic samples are almost ‘by dehnition’ especially well suited for photometric 
redshift estimation: relatively bright (and thus with small photometric errors) and often 
hlling a privileged niche in the color-redshift space, far from degeneracies (e.g. Lyman-break 
galaxies). Thus, it is risky to extrapolate the accuracy reached by current methods as 
estimated from spectroscopic samples (and this also applies to BPZ) to fainter magnitudes. 
This is especially true for the training set methods, which deliberately minimize the 
difference between the spectroscopic and photometric redshifts. 
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3. Maximum likelihood (ML) redshift estimates 

Photometric redshift techniques based on template htting look for the best estimate of 
a galaxy redshift from the comparison of its measured fluxes in nc+ 1 hlters {fa}, « = 0, Uc, 
with a set of nr template spectra which try to represent the different morphological types, 
and which have fluxes /ro(^)- These methods hnd their estimate zml by maximizing the 
likelihood C (or equivalently minimizing over all the possible values of the redshift 2 ;, 
the templates T and the normalization constant oq. 

- log(/:) + const oc x^{z, T, ao) = ^ (/« (5) 

Since the normalization constant oq is considered a free parameter, the only information 
relevant to the redshift determination is contained in the ratios among the fluxes {fa}, that 
is, in the galaxy colors. 

The dehnition of the likelihood in eq. (^ is not convenient for applying Bayesian 
methods as it depends on a normalization parameter oq, which is not convenient to dehne 
useful priors either theoretically or from previous observations. Here we prefer to normalize 
the total fluxes in each band by the flux in a ‘base’ hlter, e.g. the one corresponding to the 
band in which the galaxy sample was selected and is considered to be complete. Then the 
‘colors’ C = {cj}, are dehned as Cj = fi/fo i = l,nc, where /o is the base flux. The exact 
way in which the colors are dehned is not relevant, other combinations of hlters are equally 
valid. Hereinafter the magnitude mo (corresponding to the hux /o) will be used instead of 
/o in the expressions for the priors. And so, assuming that the magnitude errors are 

gaussianly distributed, the likelihood can be dehned as 


where 


C{T,z) 


1 


= II - CTi{z)][Cj - CTj{z)] 


and the matrix of moments Aij =< <Jci<ycj > can be expressed as 


Aq = /o 


( 6 ) 

(7) 

( 8 ) 


By normalizing by /o instead of oq, one reduces the computational burden as it is not 
necessary to maximize over /o, which is already the ‘maximum likelihood’ estimate for the 
value of the galaxy hux in that hlter. It is obvious that this assumes that the errors in the 





colors are gaussian, which in general is not the case, even if the flux errors are. Fortunately, 
the practical test performed below (Sec. ^ shows that there is little change between the 
results using both likelihood dehnitions (see £g. 0a). 


4. Bayesian photometric redshifts (BPZ) 

Within the framework of Bayesian probability, the problem of photometric redshift 
estimation can be posed as hnding the probability p{z\D,I), i.e., the probability of a 
galaxy having redshift z given the data D = {C,mQ}, and the prior information I. As 
it was mentioned in the introduction, Bayesian theory states that all the probabilities are 
conditional; they do not represent frequencies, but states of knowledge about hypothesis, 
and therefore always depend on other data or information (for a detailed discussion of 
this and many other interesting issues see Jaynes, 1998). The prior information I is an 
ample term which in general should include any knowledge that may be relevant to the 
hypothesis under consideration and is not already included in the data C, mo. Note that 
in Bayesian probability the relationship between the prior and posterior information is 
logical] it does not have to be temporal or even causal. For instance, data from a new 
observation may be included as prior information to estimate the photometric redshifts of 
an old data set. Although some authors recommend that the ‘|/' should not be dropped 
from the expressions of probability (as a remainder of the fact that all probabilities are 
conditional and especially to avoid confusions when two probabilities based on different 
prior informations are considered as equal), here the rule of simplifying the mathematical 
notation whenever there is no danger of confusion will be followed, and from now p{z) will 
stand for p{z\I), p{D\z) for p{D\z, I) etc. 

As a trivial example of the application of Bayes’s theorem, let’s consider the case if 
which there is only one template and the likelihood C only depends on the redshift z. Then, 
applying Bayes theorem 

p{z\C, mo) = 0 ^ p{z\mo)p{C\z) (9) 

p{C) 

The expression p{C\z) = £ is simply the likelihood: the probability of observing the colors 
C if the galaxy has redshift z (it is assumed for simplicity that C only depends on the 
redshift and morphological type, and not on mo) The probability p{C) is a normalization 
constant, and usually there is no need to calculate it. 

The first factor, the prior probability p{z\mQ), is the redshift distribution for galaxies 
with magnitude mo. This function allows to include information as the existence of upper 
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or lower limits on the galaxy redshifts, the presence of a cluster in the held, etc. The effect 
of the prior p{z\mo) on the estimation depends on how informative it is. It is obvious 
that for a constant prior (all redshifts equally likely a priori) the estimate obtained from 
eq. will exactly coincide with the ML result. This is also roughly true if the prior is 
‘smooth’ enough and does not present signihcant structure. However, in other cases, values 
of the redshifts which are considered very improbable from the prior information would be 
“discriminated”; they must ht the data much better than any other redshift in order to be 
selected. 

Note that in rigor, one should write the prior in eq. (^) as 

p{z\mo) = j drhQp{rho)p{rnQ\rhQ)p{z\rhQ) (10) 

where tuq is the ‘true’ value of the observed magnitude mo, p(mo) is proportional to the 
number counts as a function of the magnitude mo and p(mo|mo) cx: exp[(mo — mo)^/2cT^(,], 
i.e, the probability of observing mo if the true magnitude is mo- The above convolution 
accounts for the uncertainty in the value of the magnitude mo, which has the effect of 
slightly ‘blurring’ and biasing the redshift distribution p(z|mo). To simplify our exposition 
this effect would not be consider hereinafter, and just p{z\mQ) and its equivalents will be 
used. 


4.1. Bayesian Marginalization 

It may seem from eq. ^ (and unfortunately it is quite a widespread misconception) 
that the only difference between Bayesian and ML estimates is the introduction of a prior, 
in this case, p{z\mQ). However, there is more to Bayesian probability than just priors. 

The galaxy under consideration may belong to different morphological types represented 
by a set of ut templates. This set is considered to be exhaustive, i.e including all possible 
types, and exclusive-, the galaxy cannot belong to two types at the same time. In that 
case, using Bayesian marginalization (eq. ^) the probability p{z\D) can be ‘expanded’ into 
a ‘basis’ formed by the hypothesis p{z,T\D) (the probability of the galaxy redshift being 
z and the galaxy type being T). The sum over all these ‘atomic’ hypothesis will give the 
total probability p{z\D). That is, 

p{z\C, mo) = ^p{z, T\C, mo) oc ^p{z, T\mo)p{C\z, T) (11) 

T T 

p{C\z,T) is the likelihood of the data C given 2 ; and T. The prior p(^, T|mo) may be 
developed using the product rule. For instance 

p{z,T\mQ) = p{T\mo)p{z\T,mQ) 


( 12 ) 
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where p(T|mo) is the galaxy type fraction as a function of magnitude and p(z|T, mg) is the 
redshift distribution for galaxies of a given spectral type and magnitude. 



0 12 3 4 
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Fig. 2.— An example of the main probability distributions involved in BPZ for a galaxy at 
2 ; = 0.28 with an Irr spectral type and / 26 to which random photometric noise is added. 

From top to bottom: a) The likelihood functions p{C\z,T) for the different templates used 
in Sec Based on ML, the redshift chosen for this galaxy would be zml = 2.685 and its 
spectral type would correspond to a Spiral, b) The prior probabilities p(z,T|mo) for each 
of the spectral types (see text). Note that the probability of hnding a Spiral spectral type 
with z > 2.5 and a magnitude J = 26 is almost negligible, c) The probability distributions 
p(z, TIC, mo) oc p{z,T\mQ)p{C\z,T) , that is, the likelihoods in the top plot multiplied by 
the priors. The high redshift peak due to the Spiral has disappeared, although there is 
still a little chance of the galaxy being at high redshift if it has a Irr spectrum, but the 
main concentration of probability is now at low redshift. d) The hnal Bayesian probability 
p(2;|C, mo) = mo), which has its maximum at zb = 0.305. The shaded area 

corresponds to the value of paz, which estimates the reliability of zb and yields a value of 
0.91. 

Eq. (|IT|) and hg. I clearly illustrate the main differences between the Bayesian and 
ML methods. ML would just pick the highest maximum over all the p{C\z,T) as the best 
redshift estimate, without looking at the plausibility of the corresponding values of z or T. 
On the contrary, Bayesian probability averages all these likelihood functions after weighting 
them by their prior probabilities p{z,T\mo). In this way the estimation is not affected by 
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spurious likelihood peaks caused by noise as it is shown in £g. (see also the results of 
Sec. H). Of course that in an ideal situation with perfect, noiseless observations (and a 
nondegenerate template space, i.e, only one C for each {z,T) pair) the results obtained 
with ML and Bayesian inference would be the same. 

Instead of a discrete set of templates, the comparison library may contain spectra which 
are a function of continuous parameters. For instance, synthetic spectral templates depend 
on the metallicity Z, the dust content, the star formation history, etc. Even starting from 
a set of a few templates, they may be expanded using the principal component analysis 
(PCA) technique (|Sodre &: Cuevas 1997|) . In general, if the spectra are characterized by ha 
possible parameters A = {ai...a„^} (which may be physical characteristics of the models or 
just PCA coefficients), the probability of z given F can be expressed as 

p{z\C,mQ) = J dAp{z, A\C,mo) (X J dAp{z, A\mo)p{C\z, A) (13) 


4.2. ‘Bookmaker’ odds 


Sometimes, instead of finding a ‘point’ estimate for a galaxy redshift, one needs to 
establish if that redshift belongs within a certain interval. For instance, the problem may 
be to determine whether the galaxy has z > Zt, where Zt is a given threshold, or whether its 
redshift falls within a given z ± Az, e.g. in the selection of cluster members or background 
galaxies for leasing studies. 


As an example, let’s consider the classihcation of galaxies into the background- 
foreground classes with respect to a certain redshift threshold Zth- One must choose between 
the hypothesis Hth = {z > Zth} and its opposite, Hth = {z < Zth}- The corresponding 
probabilities may be written as 


r^th 

P{Hth\D) = / dzp{z\D) 
Jo 


(14) 


And 

_ /*CXD 

P(Ah\D) = / dzp(z\D) 


(16) 


The (‘bookmaker’) odds of hypothesis Hth are defined as the probability of Hth being true 
over the probability of Hth being false (Jaynes 1998) 


0{Hth\D) 


P{Hth\D) 

P{Hth\D) 


(16) 


When 0{Hth\D) k. 1, there is not enough information to choose between both hypothesis. 
A galaxy is considered to have 2 ; > Zth if 0{Hth\D) > Od, where Od is a certain decision 
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threshold. There are no hxed rules to choose the value of and the most appropriate 
value depends on the task at hand; for instance, to be really sure that no foreground galaxy 
has sneaked into the background sample, Od would have to be high, but if the main goal 
is selecting all the background galaxies and one does not mind including some foreground 
ones, then Od would be lower, etc. Basically this is a problem concerning decision theory. 

In the same way, the cluster galaxies can be selected by choosing a redshift threshold 
/S.Z which dehnes whether a galaxy belongs to the cluster. The corresponding hypothesis 
would he He = {\z — Zcj < Az}. 

P{He\D)= dzp{z\D) (17) 

J Zc — ^Z 

And 

_ rZc—Az poo 

P{He\D= / dzp{z\D)+ / dzp{z\D) (18) 

Jo J Zc-\-Az 

Similarly, the odds of are dehned as 


0{He\D) 


P{He\D) 

P{He\D 


(19) 


4.3. Prior calibration 


In those cases where the prior information is vague and does not allow to choose a 
dehnite expression prior probability, Bayesian inference offers the possibility of “calibrating” 
the prior, if needed using the very data sample under consideration. 

Let’s suppose that the distribution p{z,T,mo) is parametrized using nx continuous 
parameters A. They may be the coefficients of a polynomial £t, a wavelet expansion, etc. 
In that case, including A in eq. (^), the probability can be written as 

p{z,T, X\C,mo) oc J dXp{X) 

where p(A) is the prior probability of A, and p{z, T, mo|A) is the prior probability of z, T and 
mo as a function of the parameters A. The latter have not been included in the likelihood 
expression since C is totally determined once the values of and T are known. 

Now let’s suppose that the galaxy belongs to a sample containing Ug galaxies. 

Each j—th galaxy has a ‘base’ magnitude moj and colors Cj. The sets C = {Cj} and 
mo = contain respectively the colors and magnitudes of all the galaxies 


^p(2;,T,mo|A)p(C|z,T) (20) 


p{z\C,mo) = J dXj2 
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in the sample. Then, the probability of the i—th galaxy having redshift Zi given the full 
sample data C and mo can be written as 

p{zi\C,mo) = j dA^p(2:i,T, A|C'i,moi,C',mo) (21) 


The sets C' = {Cj} and m'g = {moj}, j = l,ng,j ^ i are identical to C and mo except 
for the exclusion of the data Q and mon Applying Bayes’ theorem, the product rule and 
simplifying 

p(zi|C,mo)oc y dAp(A|C',mo) ^p(zi,T,moi|A)p(C|z^,T) (22) 

where as before it has been considered that the likelihood of C, only depends on Zj, T and 
that the probability of Zi and T only depend on C and m^j through A. The expression to 
which we arrived is very similar to eq. only that now the shape of the prior is estimated 
from the data C', m^j. This means that even if one starts with a very sketchy idea about 
the shape of the prior, the very galaxy sample under study can be used to determine the 
value of the parameters A, and thus to provide a more accurate estimate of the individual 
galaxy characteristics. Assuming that the data C' (as well as mo) are independent among 
themselves 

p(A|C',mo) ocp(A)p(C',mo|A) = p(A) H p{Cj,moj\X) (23) 


where 


p{Cj,moj\\) = / dzj E p{zj,Tj, Cj, moj I A) oc / dzj Y,p{zj, Tj, moj\X)p{C\zj, Tj) ( 24 ) 

T, Ti 


If the number of galaxies in our sample is large enough, it can be reasonably assumed 
that the prior probability p(A|C',mQ) will not change appreciably with the inclusion of 
the data Ci,mQi belonging to a single galaxy. In that case, a time-saving approximation is 
to use as a prior the probability p(A|C,mo), calculated using the whole data set, instead 
of finding p(A|C',mo) for each galaxy. In addition, it should be noted that p(A|C,mo) 
represents the Bayesian estimate of the parameters which define the shape of the redshift 
distribution (see fig. m- 


4.4. Including spectroscopical information 

In some cases spectroscopical redshifts {zsi} are available for a fraction of the galaxy 
sample. It is straightforward to include them in the prior calibration procedure described 
above, using a delta-function likelihood weighted by the probability of the galaxy belonging 
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to a morphological type, as it is done to determine the priors in Sec This gives the 
spectroscopical subsample a (deserved) larger weight in the determination of the redshift 
and morphological priors in comparison with the rest of the galaxies, at least within a 
certain color and magnitude region, but, unlike what happens with the training set method, 
the information contained in the rest of the sample is not thrown away. 

If nevertheless one wants to follow the training set approach and use only the 
spectroscopic sample, it is easy to develop a Bayesian variant of this method. As before, 
the goal is to hnd an expression of the sort p(z|C, mo), which would give us the redshift 
probability for a galaxy given its colors and magnitude. If the color/magnitude/redshift 
multidimensional surface z = z{C,mQ) were inhnitely thin, the probability would just be 
p{z\C,mQ) = S{z{C,mo)), where 5(...) is a delta-function. But in the real world there 
is always some scatter around the surface dehned by z{C,mQ) (even without taking into 
account the color/redshift degeneracies), and it is therefore more appropriate to describe 
p{z\C,mo) as e.g. a gaussian of width az centered on each point of the surface z{C,mo). 
Let’s assume that all the parameters which dehne the shape of this relationship, together 
with az are included in the set A^. Using the prior calibration method introduced above, 
the probability distribution for these parameters p{Xz\Drp) can be determined from the 
training set Dt = {zsi,Ci,Tnoi}. 

p{X^\Dt) (X p{X^)Y[p{zsi\Ci,moi, Xz) (25) 

i 


The expression for the redshift probability of a galaxy with colors C and mo would 
then be 

p{z\C,mo) = j dX^p{X^\DT)p{z\C,mo,X^) (26) 

The redshift probability obtained from eq. (p6D is compatible with the one obtained 
in eq. ( [TI|) using the SED-htting procedure. Therefore it is possible to combine them in 
a same expression. As an approximation, let’s suppose that both of them are given equal 
weights, then 

p{z\C, mo) oc ^p{z, T\mQ)p{C\z, T) + J dX^p{X^\D)p{z\C, m, A^) (27) 

In fact, due to the above described redundancy between the SED-htting method and 
the training set method (Sec. H), it would be more appropriate to combine both probabilities 
using weights which would take these redundancies into account in a consistent way, roughly 
using eq.(p6D at brighter magnitudes, where the galaxies are well studied spectroscopically 
and leaving eq.(pdj) for fainter magnitudes. The exploration of this combined, training 
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set/SED-fitting approach will be left for a future paper, and in the practical tests performed 
below the followed procedure uses the SED-htting likelihood. 


5. A practical test for BPZ 



^spee 


^spec 


Fig. 3.— a)To the left, the photometric redshifts obtained by applying our ML algorithm to 
the HDF spectroscopic sample using a template library which contains only the four CWW 
main types, E/SO, Sbc, Scd and Irr. These results are very similar to those of Fernandez- 
Soto, Lanzetta & Yahil, 1998. b) The right plot shows the signihcant improvement (without 
using BPZ yet) obtained by just including two of the Kinney et ah 1996 spectra of starburst 
galaxies, SB2 and SB3, in the template set. One of the outliers disappears, the ‘sagging’ 
or systematic offset between 1.5 < z < 3.5 is eliminated and the general scatter of the 
relationship decreases from ^z/{l + Zgpec) = 0.13 to Az/(1 + Zgpec) = 0.10. 


The Hubble Deep Field (HDF; [Williams et ah 1996|) has become the benchmark in 
the development of photometric redshift techniques. In this section BPZ will be applied 
to the HDF and its performance contrasted with the results obtained with the standard 
‘frequentist’ (in the Bayesian terminology) method, the procedure usually applied to the 
HDF (jGwyn fc Hartwick 1996|,[Lanzetta, Yahil fc Fernandez-Soto 1996 , Sawicki, Lin, fc Yee 
19971, etc.). The photometry used for the HDF is that of [Fernandez-Soto, Lanzetta fc Yahil] 


1998, which, in addition to magnitudes in the four HDF hlters includes JHK magnitudes 
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from the observations of Dickinson et al. 1998 . /814 is chosen as the base magnitnde. The 
colors are dehned as described in Sec. |^. 

The template library was selected after several tests with the HDF snbsample which 
has spectroscopic redshifts (108 galaxies), spanning the range z < 6. The set of spectra 
which worked best is similar to that nsed by [Sawicki, Lin, fc Yee lOQTl . It contains fonr 
Coleman, Wn, fc Weedman 1980| templates (E/SO, Sbc, Scd, Irr), that is the same spectral 
types nsed by [lernandez-Soto, Lanzetta fc Yahil 1998| , pins the spectra of 2 starbnrsting 
galaxies from [Kinney et al. 19^(|Sawicki, Lin, fc Yee 1997| nsed two very bine SEDs from 
GISSEL). All the spectra were extended to the UV using a linear extrapolation and a cutoff 
at 912 A, and to the near-IR using GISSEL synthetic templates. The spectra are corrected 
for intergalactic absorption following jlVfadau 1995|. 



Fig. 4.— The prior in redshift p(z|mo) estimated from the HDF data using the prior 
calibration procedure described in Sec 4., for different values of the magnitude mo (/814 = 21 
to /814 = 28) 


It could seem in principle that a synthetic template set which takes (at least tentatively) 
into account galaxy evolution is more appropriate than ‘frozen’ template library obtained 
at low redshift and then extrapolated to very high redshifts. However, as [Yee 1998| has 
convincingly shown, the extended GWW set offers much better results than the GISSEL 
synthetic models Pruzual fc Gharlot 1993] . I have also tried to use the RVF set of spectra 
and the agreement with the spectroscopic redshifts is considerably worse than using the 
empirical template set. And if the synthetic models do not work well within the magnitude 
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range corresponding to the HDF spectroscopic sample is relatively bright, there is little 
reason to suppose that their performance will improve at fainter magnitudes. 

However, even working with empirical templates, it is important to be sure that the 
template library is complete enough. Fig. ^ illustrates the effects of template incompleteness 
in the redshift estimation. The left plot displays the results obtained using ML (Sec ^ 
redshift estimation using only the four CWW templates (this plot is very similar to the 
Zphot — Zspec diagram shown in [Fernandez-Soto, Lanzetta fc Yahil 1998| , which conhrms the 
validity of the expression for the likelihood introduced in Sec ^). On the right, the results 
obtained also using ML (no BPZ yet) but including two more templates, SB2 and SB3 from 
Kinney et al. 19^. It can be seen that the new templates almost do not affect the low 


redshift range, but the changes at z > 2 are quite dramatic, the ‘sagging’ of the CWW-only 
diagram disappears and the general scatter of the diagram decreases by 20%. This shows 
how important it is to include enough galaxy types in the template library. No matter how 
sophisticated the statistical treatment is, it will do little to improve the results obtained 
with a dehcient template set. 

The hrst step in the application of BPZ is choosing the shape of the priors. Due to the 
depth of the HDF there is little previous information about the redshift priors, so this is a 
good example in which the prior calibration procedure described in Sec|| has to be applied. 
It will be assumed that the early types (E/SO) and spirals (Sbc,Scd) have a spectral type 
prior (eq. |T^ ) of the form 

p(T|mo) = (28) 

with t = 1 for early types and t = 2 for spirals. The irregulars (the remaining three 
templates; t = 3) complete the galaxy mix. The fraction of early types at J = 20 is assumed 
to be /i = 35% and that of spirals /2 = 50%. The parameters ki and ^2 are left as free. 
Based on the result from redshift surveys the following shape for the redshift prior has been 
chosen: 


p{z\T,mo) oc z°‘*exp{ — [ 




(29) 


Zmt[mo) 

where 

= ZQt + kmt{mo - 20.) (30) 

and at, Zqi and kt are considered free parameters. In total, 11 parameters have to be 
determined using the calibration procedure. For those objects with spectroscopic redshifts, 
a ‘delta-function’ located at the spectroscopic redshift of the galaxy has been used instead 
of the likelihood p{C\z,T). Table 1 shows the values of the ‘best’ values of the parameters 
in eq. (^ , ^0]) found by maximizing the probability in eq. ([2^) using the subroutine amoeba 
([Press et al. 199^. The errors roughly indicate the parameter range which encloses 66% 


of the probability. The values of the parameters in eq. (p8[) are kl = 0.47 ± 0.02 and 
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k2 = 0.165 ± 0.01. The prior in redshift p{z\mo) can obviously be found by summing over 
the ‘nuisance’ parameter (Jaynes 1998), in this case T: 

p{z\mo) = J2p{T\mo)p{z\T,mo) (31) 

T 

Fig. ^ plots this prior for different magnitudes. 

With the priors thus found, one can proceed with the redshift estimation using eq. 
(^). Here the multiplication by the probability distribution p(A) and the integration over 
d\ will be skipped. As it can be seen from Table 1, the uncertainties in the parameters 
are rather small and it is obvious that the results would not change appreciably, so the 
additional computational effort of performing a 11-dimensional integral is not justihed. 

There are several options to convert the continuous probability p{z\C,mQ) to a point 
estimate of the ‘best’ redshift zb- Here the ‘mode’ of the hnal probability is chosen, 
although taking the ‘median’ value of z, corresponding to 50% of the cumulative probability, 
or even the ‘average’ < z >= / dzzp{z\C,mo) is also valid. 





Fig. 5.— The photometric redshifts obtained with BPZ plotted against the spectroscopic 
redshifts. The differences with £g. |b are the elimination of 3 galaxies with pAz < 0.99 
(see text). This removes the only outlier present in £g. The rms scatter around the 
continuous line is Azs/(1 J- zb) = 0.08. 

It was mentioned in sec ^ that bayesian probability offers a way to characterize the 
accuracy of the redshift estimation using the odds or a similar indicator, for instance by 
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analogy with the gaussian distribution a ‘la’ error may be defined using a interval with 
contains 66% of the integral of p(z|C, mg) around zb, etc. Here it has been chosen as an 
indicator of the redshift reliability the quantity paz, the probability oi\z — zb\ < Az, where 
is the galaxy redshift. In this way, when the value of paz is low, we are warned that the 
redshift prediction is unreliable. As it will be shown below, paz is extremely efficient in 
picking out galaxies with ‘catastrophic errors’ in their redshifts. The photometric redshifts 
resulting from applying BPZ to the spectroscopic sample are plotted in fig. Galaxies 
with a probability paz < 0.99 (there are three of them) have been discarded, where Az is 
chosen to be 0.2 x (1 + z), to take into account that the uncertainty grows with the redshift 
of the galaxies. 


It is evident from fig. ^ that the agreement is very good at all redshifts. The residuals 


Azb = Zb — Zspec have < Azb >= 0.002. If Azb is divided by a factor (1 + Zspec)^ 


as 


suggested in [Fernandez- Soto, Lanzetta fc Yahil 1998| , the rms of the quantity Azb/{1 + zb) 
is only 0.08. There are no appreciable systematic effects in the residuals. One of the three 
objects discarded because of their having paz < 0.99 is the only clear outlier in our ML 
estimation, with zbpz = 0.245 and Zgpec = 2.93 (see fig. |b), evidence of the usefulness of 
Paz to generate a reliable sample. 


From the comparison of fig. |p3 with fig. |^, it may seem that, apart from the exclusion 
of the outlier, there is not much profit in applying BPZ with respect to ML. This is not 
surprising in the particular case of the HDF spectroscopic sample, which is formed mostly 
by galaxies either very bright or occupying privileged regions in the color space. The 
corresponding likelihood peaks are thus rather sharp, and little affected by smooth prior 
probabilities. 


To illustrate the effectiveness of BPZ under worse than ideal conditions, the photometric 
redshifts for the spectroscopic sample are estimated again using ML and BPZ but restricting 
the color information to the UBVI HST filters. The results are plotted in fig. p. The ML 
redshift diagram displays 5 ‘catastrophic errors’ {Az ^ 1). Note that these are the same 
kind of errors pointed out by [Ellis 1997] in the first HDF photometric redshifts estimations. 
BPZ with a paz > 0.99 threshold (which eliminates a total of 7 galaxies) totally eliminates 
those outliers. This is a clear example of the capabilities of BPZ (combined with an 
adequate template set) to obtain reliable photometric redshift estimates. Note that even 
using near-IR colors, the ML estimates shown in fig. ^ presented outliers. This shows that 
applying BPZ to UV-only data may yield results more reliable than those obtained with 
ML including near-IR information! Although of course no more accurate; the scatter of fig. 
|b, once the outliers are removed is Az ~ 0.18, whereas fig. |b has a scatter of Az ~ 0.24, 
which incidentally is approximately the scatter of fig. |^. 
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Fig. 6.— a) The left plot shows the results of applying ML to the HDF spectroscopic 
sample using only the four HST bands. Compare with £g. |b, which uses also the near IR 
photometry of Dickinson et ah 1998. The rms of the diagram is increased, and there are 
several outliers, b) The right plot shows how applying BPZ with a threshold paz > 0.99 leaves 
the remaining 101 objects (93.5% of the total) virtually free of outliers. It is noteworthy 
that these results are totally comparable or even better (as there are no outliers) than those 
obtained in fig. ^a, in which the near-IR magnitudes were included in the estimation. 


Another obvious way of testing the efficiency of BPZ is with a simulated sample. 

The latter can be generated using the procedure described in [Fernandez-Soto, Lanzetta] 
fc Yahil 1998| . Each galaxy in the HDF is assigned a redshift and type using ML (this 
is done deliberately to avoid biasing the test towards BPZ) and then a mock catalog is 
created containing the colors corresponding to the best htting redshifts and templates. To 
represent the photometric errors present in observations, a random photometric noise of the 
same amplitude as the photometric error is added to each object. Fig. 0b shows the ML 
estimated redshifts for the mock catalog (/ < 28) against the ‘true’ redshifts; although in 
general the agreement is not bad (as could be expected) there are a large number of outliers 
(10%), whose positions illustrate the main source of color/redshift degeneracies: high z 
galaxies which are erroneously assigned ^ ^ 1 redshifts and vice versa. This shortcoming of 
the ML method is analyzed in detail in [Fernandez-Soto, Lanzetta fc Yahil 1998 . In contrast, 
fig- 0a shows the results of applying BPZ with a threshold of paz > 0.9. This eliminates 
20 % of the initial sample (almost half of which have catastrophically wrong redshifts), but 
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the number of outliers is reduced to a remarkable 1%. 

Is it possible to define some ‘reliability estimator’, similar to p^z within the ML 
framework? The obvious choice seems to be Fig. Pd plots the value of vs. the ML 
redshift error for the mock catalog. It is clear that is almost useless to pick out the 
outliers. The dashed line marks the upper 25% quartile in most of the outliers are below 
it, at smaller x^ values. In stark contrast, £g. j^a plots the errors in the BPZ redshifts vs. 
the values of paz- The lower 25% quartile, under the dashed line, contains practically all the 
outliers. By setting an appropriate threshold one can virtually eliminate the ‘catastrophic 
errors’. 






Fig. 7.— a) To the left, the photometric redshifts zb estimated using BPZ for the / < 28 HDF 
mock catalog, plotted against the ‘true’ redshifts Zt (see text). A threshold of paz > 0.90, 
which eliminates 20% of the objects has been applied, b) The right plot shows the results 
obtained applying ML to the same mock sample. The fraction of outliers is 10%). 

Fig. I shows the numbers of galaxies above a given paz threshold in the HDF as a 
function of magnitude and redshifts. It shows how risky it is to estimate photometric 
redshifts using ML for faint, / ;^ 27 objects; the fraction of objects with possible catastrophic 
errors grows steadily with magnitude. 

There is one caveat regarding the use of paz or similar quantities as a reliability 
estimator. They provide a safety check against the color/redshift degeneracies, since 
basically they tell us if there are other probability peaks comparable to the highest one. 
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but they cannot protect us from template incompleteness. If the template library does not 
contain any spectra similar to the one corresponding to the galaxy, there is no indicator 
able to warn us about the unreliability of the prediction. Because of this, no matter how 
sophisticated the statistical methods become, it is fundamental to have a good template 
set, which contains—even if only approximately—all the possible galaxy types present in 
the sample. 


20 

15 
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Fig. 8.— a) On the left, the probability p^z plotted against the absolute value of the 
difference between the ‘true’ redshift (zt) and the one estimated using BPZ (zb) for the 
mock sample described in Sec. |^. The higher the value of paz, the more reliable the redshift 
should be. The dashed line shows the 25% low quartile in the value of paz- Most of the 
outliers are at low values of paz, what allows to eliminate them by setting a suitable threshold 
on paz (see text and fig. |^) b) The right plot shows that it is not possible to do something 
similar using ML redshifts and as an estimator. The value of of the best ML £t is 
plotted against the error in the ML redshift estimation \zt — ZML\- The dotted line shows the 
25% high quartile in the values of One would expect that low values of (and therefore 
better hts) would correspond to more reliable redshifts, but this obviously is not the case. 
This is not surprising: the outliers in this hgure are all due to color/reshifts degeneracies 
as the one displayed in £g. 2, which may give an extremely good fit to the colors C, but a 
totally wrong redshift. 

Finally, £g. shows the redshift distributions for the HDF galaxies with I < 27. No 
objects have been removed on the basis of paz, so the values of the histogram bins should 
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be taken with care. The overplotted continous curves are the distributions used as priors 
and which simultaneously are the Bayesian fits to the final redshift distributions. The 
results obtained from the HDF will be analyzed in more detail, using a revised photometry, 
in a forthcoming paper. 




Fig. 9.— a) On the left, histograms showing the number of galaxies over pAz thresholds 
of 0.90 and 0.99 as a function of magnitude. It can be seen that the reliability of the 
photometric redshift estimation quickly degrades with the magnitude, b) The same as a) 
but as a function of redshift. 


6. Applications 


As we argue above, the use of BPZ for photometric redshift estimation offers obvious 
advantages over standard ML techniques. However, quite often obtaining photometric 
redshifts is not an end in itself, but an intermediate step towards measuring other quantities, 
like the evolution of the star formation rate (IConnolly et ah 1997|) , the galaxy-galaxy 
correlation function ( Ponnolly, Szalay, fc Brunner 19^ , |Miralles fc Pello 1998|) , galaxy 
or cluster mass distributions ([Hudson et al. 1998|) , etc. The usual procedure consists 
in obtaining photometric redshifts for all the galaxies in the sample, using ML or the 
training set method, and then work with them as if these estimates were accurate, reliable 
spectroscopic redshifts. The results of the previous sections alert us to the dangers inherent 
in that approach, as it hardly takes into account the uncertainties involved in photometric 
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Fig. 10.— The zb redshift distributions for the I < 27 HDF galaxies divided by spectral 
types. The solid lines represent the corresponding p{z,T) distributions estimated using the 
prior calibration method described in the text. 


redshift estimation. In contrast, within the Bayesian framework there is no need to work 
with the discrete, point-like ‘best’ redshift estimates. The whole redshift probability 
distribution can be taken into account, so that the uncertainties in the redshift estimation 
are accounted for in the hnal result. To illustrate this point, let’s outline how BPZ can be 
applied to several problems which use photometric redshift estimation. 


6.1. Spectral properties of a galaxy population 


If, instead of working with a discrete set of templates, one uses a spectral library whose 
templates depend of parameters as the metallicity, the star-formation history, initial mass 
function, etc., represented by A in Sec it is obvious from equation (0 that the same 
technique used to estimate the redshift can be applied to estimate any of the parameters 
A which characterize the galaxy spectrnm. For instance, let’s suppose that one want to 
estimate the parameter a*. Then dehning A' = {%}, j ^ i, we have 


p{ai\C,mo) = J dz J dA'p(ai, z, A'\C,mo) oc j dz j dA'p{ai, z, A'\mQ)p{C\z, A) (32) 
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That is, the likelihoods p{C\z, A) and the weights p(aj, z, A') = A) are the same ones 
used for the redshift estimation (eq. 0), only that now the integration is performed over 
the variables z and A' instead of A. In this way, depending on the template library which 
is being used, one can estimate galaxy characteristics as the metallicity, dust content, etc. 
An important advantage of this method over ML is that the estimates of the parameter 
automatically include the uncertainty of the redshift estimation, which is reflected in the 
hnal value of p{ai\C,mQ). Besides, by integrating the probability over all the parameters 
A', one precisely includes the uncertainties caused by possible parameter degeneracies in 
the hnal result for Oj. It should also be noted that as many of the results obtained in this 
paper, this method can be almost straightforwardly applied to spectroscopical observations; 
one has only to modify the likelihood expression which compares the observed huxes with 
the spectral template. The rest of the formalism remains practically identical. 


6.2. Galaxy Clusters: member identification 


One frequent application of photometric redshift techniques is the study of galaxy 
cluster helds. The goals may be the selection of cluster galaxies to characterize their 
properties, especially at high redshifts, or the identihcation of distant, background galaxies 
to be used in gravitational lensing analysis ( Benitez fc Broadhurst 1998 ). BPZ offers an 
effective way of dealing with such problems. 


To simplify the problem, the effects of gravitational lensing on the background galaxies 
(magnihcation, number counts depletion, etc.) will be neglected (see however the next 
subsection). Let’s suppose that we already have an estimate of the projected surface density 
of cluster galaxies (which can roughly be obtained without any photometric redshifts, just 
from the number counts surface distribution) nc(mo, r), where r is the position with respect 
to the cluster center. The surface density of ‘held’, non-cluster galaxies is represented by 
n/(mo). For each galaxy in the sample we know its magnitude and colors mQ,C and also 
its position r, which is now a relevant parameter in the redshift estimation. Following eq. 
(Q) we can write 

p{z\C,mo,r} (x'^p{z,T\mo,f^piC\z,T) (33) 

T 

A dependence on the magnitude (e.g. for the early types cluster sequence) could easily be 
included in the likelihood p{C\z, T) if needed. The prior can be divided into the sum of two 
different terms: 

p{z,T\mo,f^ = Pc{z,T\mo,r) + Pf{z,T\mQ,f) (34) 

where Pc represents the prior probability of the galaxy belonging to the cluster, whereas pf 
corresponds to the prior probability of the galaxy belonging to the general held population. 



The expression for can be written as 


Pc{z, T\mo, f) = - -p^{T\mo)g{zc, aZc) (35) 

nc{mo,r) + n/(mo) 

The probability Pc{T\mQ) corresponds to the expected galaxy mix fraction in the 
cluster, which in general will depend on the magnitude and will be different from that 
of held galaxies. The function g{zc, crzc) is the redshift prohle of the cluster; a good 
approximation could be a gaussian with a width corresponding to the cluster velocity 
dispersion. 

The second prior takes the form 

Pf{z, T|mo, f) = ——- -pf{T\mo)pf{z\T, mo) (36) 

nc[mo, r) + nf[mo) 

which uses the priors for the general held galaxy population (Sec Finally, the 
hypothesis that the galaxy belongs to the cluster or not can be decided about with the help 
of a properly dehned or with the odds introduced in Sec ||. 


6.2.1. Cluster detection 


We have assumed above that the cluster redshift and its galaxy surface density 
distribution are known. However, in some cases, there is a reasonable suspicion about the 
presence of a cluster at a certain redshift, but not total certainty, and our goal is to conhrm 
its existence. An example using ML photometric redshift estimation is shown in [Fello et 
|al. 1996| . An extreme case with minimal prior information occur in optical cluster surveys, 
galaxy catalogs covering large areas of the sky are searched for clusters. In those cases 
there are no previous guesses about the position or redshift of the cluster, and a ‘blind’, 
automatized search algorithm has to be used (Postman et al. 1996|) . 


The prior expression used in the previous subsection offers a way to build such a 
searching method. Instead of assuming that the cluster redshift and its surface distribution 
are known, the redshift can be left as a free parameter Zc and the expression characterizing 
the cluster galaxy surface density distribution nc{mo,r) can be parametrized using the 
quantities Ac. For simplicity, let’s suppose that 


nc(mo, r) = Ac0(mo, Zc)f(rc, avc) (37) 

where Ac is the cluster ‘amplitude’, Zc) is the number counts distribution expected 

for the cluster (which in general will depend on the redshift Zc) and f{rc, avc) represents 
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the cluster profile, centered on fc and with a scale width of avc- This expression, except 
for the dependence on the redshift is very similar to that used by [Postman et ah 1996| to 
dehne their likelihood. Then for a multicolor galaxy sample with data mo, C and r, the 
probability 

p(Ac,fc,arc,Zc,aZc\nio,C,r) (38) 

can be developed analogously to how it was done in Sec. The probability assigned 
to the existence of a cluster at a certain redshift and position may be simply dehned as 
p{Ac\zc,rc) > 0 . 


6.3. Cluster lensing 


It seems that the most obvious application of BPZ to cluster lensing analysis is the 
selection of background galaxies with the technique described in the previous subsection in 
order to apply the standard cluster mass reconstruction techniques ([Kaiser fc bquires 1993 , 
[Broadhurst 1995[ ^eitz, Schneider, fc Bartelmann 19^ , [Taylor et al. 199^) . However, using 
Bayesian probability it is possible to develop an unihed approach which simultaneously 
considers the lensing and photometric information in an optimal way. 

In a simplihed fashion, the problem of determining the mass distribution of a galaxy 
cluster from observables can be stated as hnding the probability 

p(Am, Ac|e,r, mo,C, Ag) (39) 


where Am represent the parameters which describe the cluster mass distribution; their 
number may range from a few, if the cluster is described with a simplihed analytical 
model or as many as wanted if the mass distribution is characterized by e.g. Fourier 
coefficients ([Squires fc Kaiser 1996[ ). Xq represents the cosmological parameters, which 
sensitively affect the lensing strength. The parameter set Ag represents the properties of the 
background galaxy population which affect the lensing, as its redshift distribution, number 
counts slope, etc. and it is assumed to be known previously. The data e correspond to the 
galaxy ellipticities, r to their angular positions. As above, mo, C correspond to their colors 
and magnitudes. For simplicity, it will be assumed that the cluster and foreground galaxies 
have been already removed and we are dealing only with the background galaxy population. 


Analogous to eq. (^3[ ), we can develop eq. (^^ as 


p{Xm, AgIg, r, mo, C, Ag) oc p{Xm)p{Xc) H 

i 


dZip(Ci,mtx,f,,ei,Zi\\M, Ac, Ac) 


(40) 
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where the last factor may be written as 

p{Ci, moi, fi, Ci, ZilXu, Ac, Ac) oc p{ei\zi, fl, Am, Xc)p{rl\moi, Am, Ac, XG)p{zi\C, moi, fl, Am, Ac, Ac) 

(41) 

The meaning of the three factors on the right side of the equation is the following: p{ei\...) 
represents the likelihood of measuring a certain ellipticity e* in a galaxy given its redshift, 
position, etc. The second factor p(?y|...) corresponds to the so called “Broadhurst effect”, 
the number counts depletion of background galaxies caused by the cluster magnihcation 
fi (Broadhurst 1995, Benitez & Broadhurst 1998). The last factor, p{zi\...) is the redshift 
probability, but including a correction which takes into account that the observed 
magnitude of a galaxy mo is affected by the magnihcation fi{r). It is clear that the 
simplihed method outlined here is not the only way of applying Bayesian probability to 
cluster mass reconstruction. My purpose here is to show that this can be done considering 
the photometric redshifts in a integrated way with the rest of the information. 


6.4. Galaxy evolution and cosmological parameters 


As it has been shown in section (^), BPZ can be used to estimate the parameters 
characterizing the joint magnitude-redshift-morphological type galaxy distribution. For 
small helds, this distribution may be dominated by local perturbations, and the redshift 
distribution may be ‘spiky’, as it is observed in redshift surveys of small helds. However, 
if one were to average over a large number of helds, the resulting distribution would 
contain important information about galaxy evolution and the fundamental cosmological 
parameters, bandage 196 1| included galaxy counts as one of the four fundamental tests of 
observational cosmology, although noting that the number-redshift distribution is in fact 
more sensitive to the value of flo- As [Gardner 1998 also notes, the color distribution of the 
galaxies in a survey hold also much more information about the process of galaxy evolution 
that the raw number counts. However, quite often the only method of analyzing multicolor 
observations is just comparing them with the number counts model predictions, or at most, 
with color distributions. There are several attempts at using photometric redshifts to 
study global galaxy evolution parameters (e.g. jSawicki, Lin, fc Yee 1997| , Ponnolly et ahj 
|1997|) , but so far there is not an integrated statistical method which would simultaneously 
considers all the information, magnitudes and colors, contained in the data, and set it 
against the model predictions. 


It is then clear that eq. (^) can be used to estimate these parameters from large 
enough samples of multicolor data. If it is assumed that all the galaxies belong to a few 
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morphological types, the joint redshift-magnitude-‘type’ distribution can be written as 

n{z,mo,T) oc (42) 

dz 

where V{z) is the comoving volume as a function of redshift, which depends on the 
cosmological parameters and Hq, and (I)t is the Schecter luminosity function for 

each morphological type T, where the absolute magnitude Mq has been substituted 
by the apparent magnitude rriQ (a transformation which depends on the redshifts, 
cosmological parameters and morphological type). Schecter’s function also depend on 
M*, a and (j)*, and on the evolutionary parameters e, such as the merging rate, the 
luminosity evolution, etc. Therefore, the prior probability of z,A and mo depends on 
the parameters = {hlo) and e. As an example, let’s suppose 

that one wants to estimate e, independently of the rest of the parameters, given the data 
D = {Di} = {Ci,moi}. Then 

p(e|D) = J dXcdX^p{e, Ac, A*|D) (43) 

p(e|D) oc / dAcdA*p(e, Ac, A*) / dziJ2pi^i:T,moi\e, Xc, X^)p{Ci\zi,T) (44) 

J i T 

The prior ^(zj, T, moi|e, Ac, A*) can be derived from n{z,mo,T) in eq. (^). The prior 
p{e , Ac, A*) allows to include the uncertainties derived from previous observations or theory 
in the values of these parameters, even when they are strongly correlated among themselves, 
as in the case of the Schecter function parameters A*. The narrower the prior p{e, Ac, A*) 
is, the less ‘diluted’ the probability of e and the more accurate the estimation. 


7. Conclusions 

Despite the remarkable progress of faint galaxy spectroscopical surveys, photometric 
redshift techniques will become increasingly important in the future. The most frequent 
approaches, the template-htting and empirical training set methods, present several 
problems related which hinder their practical application. Here it is shown that by 
consistently applying Bayesian probability to photometric redshift estimation, most of those 
problems are efficiently solved. The use of prior probabilities and Bayesian marginalization 
allows the inclusion of valuable information as the shape of the redshift distributions 
or the galaxy type fractions, which is usually ignored by other methods. It is possible 
to characterize the accuracy of the redshift estimation in a way with no equivalents in 
other statistical approaches; this property allows to select galaxy samples for which the 
redshift estimation is extremely reliable. In those cases when the a priori information is 
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insufficient, it is shown how to ‘calibrate’ the prior distributions, using even the data under 
consideration. In this way it is possible to determine the properties of individual galaxies 
more accurately and simultaneously estimate their statistical properties in an optimal 
fashion. 

The photometric redshifts obtained for the Hubble Deep Field using optical and near-IR 
photometry show an excellent agreement with the ~ 100 spectroscopic redshifts published 
up to date in the interval 1 < z < 6, yielding a rms error Azb/{1 + z^pec) = 0.08 and no 
outliers. Note that these results, obtained with an empirical set of templates, have not 
been reached by minimizing the difference between spectroscopic and photometric redshifts 
(as for empirical training set techniques, which may lead to an overestimation of their 
precision) and thus offer a reasonable estimate of the predictive capabilities of BPZ. The 
reliability of the method is also tested by estimating redshifts in the HDF but restricting 
the color information to the UBVI hlters; the results are shown to be more reliable than 
those obtained with the existing techniques even including the near-IR information. 

The Bayesian formalism developed here can be generalized to deal with a wide range 
of problems which make use of photometric redshifts. Several applications are outlined, 
e.g. the estimation of individual galaxy characteristics as the metallicity, dust content, etc., 
or the study of galaxy evolution and the cosmological parameters from large multicolor 
surveys. Finally, using Bayesian probability it is possible to develop an integrated statistical 
method for cluster mass reconstruction which simultaneously considers the information 
provided by gravitational lensing and photometric redshift estimation. 


I would like to thank Tom Broadhurst and Rychard Bouwens for careful reading the 
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Table 1. Parameters of the priors p{z\T,mo){see text) 


Spectral type 

Oit 

Zot 

kmt 

E/SO 

2.26 ±0.05 

0.48 ±0.03 

0.061 ±0.06 

Sbc,Scd 

1.71 ±0.04 

0.44 ±0.02 

0.044 ±0.002 

Irr 

1.125 ±0.015 

0.038 ±0.01 

0.178 ±0.002 






